library(e1071) # to understand skewness
library(dplyr)
library(stringr) # Used to rename the columns by removing the word team from the column header
library(VIM) # To understand NAs
library(caret)
library(mice)
library(MASS) # to use for robust Linear Regression.
# browse to the data
moneyball = read.csv('/Users/legs_jorge/Documents/Data Science Projects/MSDS_Northwestern/MSDS 411/Unit 01 Moneyball Baseball Problem/Data/moneyball.csv', header = T)
colnames(moneyball) <- str_replace_all(colnames(moneyball),"TEAM_","") %>% 
  tolower() # Fixing column names

Introduction

The moneyball dataset has prompted many companies, teams, and organizations to study and use the data they generate or gather. This project highlights pitfalls that analysts fall into when they skip due diligence and do not prepare the data before modeling.
This paper will focus on:
1. Data Exploration
2. Data Transformation
3. Model Building
4. How to select the best model

Data Exploration

Step 1: Can we find outliers in our Independent and Dependent variables?

Outliers can distort a model by unduly influencing its fit. Boxplots help identify them. We can also use the Cleveland dotplot, which plots the row number against the observed value, to quickly reveal patterns among the outliers and to check the raw data for errors such as typos made during data collection. Points on the far right or far left are observed values considerably larger or smaller than the majority of the observations, and they require further investigation. Used together with the boxplot and histogram, this chart makes it easy to see where in the data the outliers occur.

par(mfrow = c(1, 3))
# Columns 2 to 17 hold target_wins and the predictors; column 1 is the index
for (i in 2:17) {
  col_name <- colnames(moneyball)[i]
  # Cleveland dotplot: observed value on the x-axis, row index on the y-axis
  plot(moneyball[, i], moneyball$index, xlab = col_name, ylab = "Index",
       main = paste("Cleveland dotplot of", col_name))
  boxplot(moneyball[, i], col = "#A71930", main = paste("Boxplot of", col_name))
  hist(
    moneyball[, i],
    col = "#A71930",
    xlab = col_name,
    main = paste("Histogram of", col_name)
  )
}
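
The boxplots above flag points with the usual 1.5 × IQR whisker rule. As a rough sketch of how the same rule can be applied numerically, the snippet below uses base R's boxplot.stats() on a made-up vector; the values and the names `x`, `flagged`, and `flagged_rows` are illustrative assumptions, not the moneyball data.

```r
# Toy vector (hypothetical values) with one obviously extreme point
x <- c(10, 12, 11, 13, 12, 11, 14, 95)

# boxplot.stats() returns the points beyond the whiskers (1.5 * IQR by default) in $out
flagged <- boxplot.stats(x)$out

# Row positions of the flagged values, handy for checking the raw records
flagged_rows <- which(x %in% flagged)

print(flagged)
print(flagged_rows)
```

On a real column, the row positions can be used to pull the corresponding records and judge whether they are data-entry errors or legitimate extremes.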

It looks like the outliers are legitimate, so we will try the spatial sign transformation to deal with them.
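
The spatial sign transformation projects each centered and scaled observation onto the unit sphere, so an extreme row ends up at the same distance from the center as every other row. A minimal base-R sketch on made-up data follows; `toy` and `spatial_sign` are hypothetical names used only for illustration, and caret's preProcess() offers the same idea as method = "spatialSign".

```r
# Made-up two-column data with one extreme row (hypothetical values)
toy <- data.frame(a = c(1, 2, 3, 4, 50), b = c(2, 1, 4, 3, 40))

spatial_sign <- function(df) {
  z <- scale(df)                    # center and scale each column first
  norms <- sqrt(rowSums(z^2))       # Euclidean length of each row
  z / norms                         # divide each row by its own length
}

ss <- spatial_sign(toy)
print(round(rowSums(ss^2), 6))      # every row now has squared length 1
```

Because every row is rescaled jointly, the transformation should be applied to the predictors as a group rather than one column at a time.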

Now that step one is done, let’s look at step 2.

Step 2: Are the data normally distributed?

From the histograms above we can clearly see that most variables are not normally distributed, although a few roughly follow a normal distribution. Let’s use a QQ-plot to check each column for normality, alongside a histogram and a skewness value.
- If skewness is less than −1 or greater than +1, the distribution is highly skewed.
- If skewness is between −1 and −½ or between +½ and +1, the distribution is moderately skewed.
- If skewness is between −½ and +½, the distribution is approximately symmetric.

par(mfrow = c(2, 2))
for (i in 2:17) {
  col_name <- colnames(moneyball)[i]
  qqnorm(moneyball[, i], main = paste("QQ-Plot of", col_name))
  qqline(moneyball[, i], col = 2)

  hist(
    moneyball[, i],
    col = "#A71930",
    xlab = col_name,
    # na.rm = TRUE so columns with missing values still report a skewness
    main = paste0("Skewness = ", skewness(moneyball[, i], na.rm = TRUE))
  )
}
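
The skewness bands listed above can be turned into a small helper. The sketch below is an illustration only: `moment_skew` and `skew_label` are hypothetical names, the data are made up, and the moment-based formula used here differs slightly from the small-sample scaling that e1071's skewness() applies by default.

```r
# Moment-based sample skewness: m3 / m2^(3/2); NAs are dropped first
moment_skew <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Map a skewness value onto the three bands listed above
skew_label <- function(s) {
  if (abs(s) > 1) {
    "highly skewed"
  } else if (abs(s) >= 0.5) {
    "moderately skewed"
  } else {
    "approximately symmetric"
  }
}

right_tailed <- c(1, 1, 2, 2, 3, 3, 4, 30, NA)   # hypothetical values
symmetric    <- c(-2, -1, 0, 1, 2)

print(skew_label(moment_skew(right_tailed)))
print(skew_label(moment_skew(symmetric)))
```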

We will need to try a transformation to correct for skewness, with Box-Cox being the first choice.
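
For reference, the Box-Cox family maps a strictly positive value x to (x^λ − 1) / λ when λ ≠ 0 and to log(x) when λ = 0. The sketch below passes λ by hand on made-up data, which is an assumption made for illustration; caret's preProcess(method = "BoxCox") estimates λ from the data instead. The name `box_cox` is hypothetical.

```r
# Box-Cox transform for strictly positive x:
#   (x^lambda - 1) / lambda  when lambda != 0,  log(x) when lambda == 0
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

x <- c(1, 2, 4, 8, 16, 32, 64)      # hypothetical right-skewed values
print(box_cox(x, 0))                # log transform
print(box_cox(x, 0.5))              # square-root-like transform
print(box_cox(x, 1))                # lambda = 1 only shifts the data by 1
```

The positivity check matters in practice: columns that contain zeros or negative values cannot be transformed this way without first being shifted.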

Step 3: Are there lots of NAs in the data?

R gives us many ways to understand the distribution of missing values within the data. Let’s first calculate the percentage of missing values relative to the total number of observations.

NAPerc <-
  sapply(moneyball, function(x)
    (sum(is.na(x)) / length(x)) * 100) %>%
  data.frame()
NAPerc$Column <- rownames(NAPerc)
colnames(NAPerc) <- c("NA_Perc", "Col_Name")
# Trying to understand the percentage of NAs per Column
NA_col <- subset(NAPerc, NA_Perc > 0) %>% arrange(desc(NA_Perc))
NA_col

Let’s look at the pattern of missing data to try to get more insights. It’s clear that batting_hbp is going to be a problematic column with 92% of the data missing. Before we start the imputation or deleting variables, let’s try to understand why we have missing data.

Let’s use the mice package to help us understand how the NAs behave in the data. mice provides a handy function called md.pattern that shows the pattern of missing data. By looking at the pattern, we can hopefully get an idea of why the data could be missing.

md.pattern(moneyball) %>% data.frame()

Each row of the output is one missing-data pattern, and the first column (the row names) shows the number of observations that share it. There are 191 observations with no missing values, and 1295 observations that are complete except for the variable batting_hbp. The rightmost column shows how many variables are missing in each pattern; the first row, for example, has no missing values, so its entry is “0”. The last row counts the missing values for each variable: pitching_bb has none, while batting_so has 102. This table is helpful when deciding whether to drop observations whose number of missing variables exceeds a preset threshold.
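
To make the layout of that table concrete, the following base-R sketch rebuilds the same two summaries (observations per pattern and missing values per variable) on a made-up data frame. It is not how md.pattern is implemented; `toy`, `pattern_counts`, and `missing_per_col` are hypothetical names.

```r
# Toy data frame with missing values (hypothetical columns and values)
toy <- data.frame(
  a = c(1, 2, NA, 4, 5, 6),
  b = c(NA, 2, NA, 4, 5, NA),
  c = c(1, 2, 3, 4, 5, 6)
)

miss <- is.na(toy)

# One string per row: 1 = observed, 0 = missing (the coding md.pattern also uses)
pattern <- apply(!miss, 1, function(r) paste(as.integer(r), collapse = ""))

pattern_counts  <- table(pattern)   # observations per pattern
missing_per_col <- colSums(miss)    # missing values per variable

print(pattern_counts)
print(missing_per_col)
```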

After careful analysis, the decision is to keep batting_hbp. Because I want to transform it into a binary variable, I will hold it out until all the imputation is done.

batting_hbp_bi <- if_else(is.na(moneyball$batting_hbp),0,1)
batting_hbp <- moneyball$batting_hbp
moneyball_trans <- subset(moneyball, select = -c(batting_hbp))

Let’s impute and treat the data for missing values before testing it for multicollinearity.

The mice package will help us with this task. Since we only have numeric values, mice will automatically choose PMM (Predictive Mean Matching) as the method. A great resource for understanding this technique is found here.
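
The imputation call itself does not appear in this document, so the following is only a sketch of what such a step could look like, run on made-up data. The names `toy`, `imp`, and `toy_imp`, the seeds, and m = 5 are all assumptions; on the real data the input would presumably be moneyball_trans, with the completed result stored as moneyball_imp.

```r
library(mice)

set.seed(42)
# Made-up numeric data with a few missing cells (hypothetical values)
toy <- data.frame(x = rnorm(60, 50, 10))
toy$y <- 2 * toy$x + rnorm(60, 0, 5)
toy$z <- toy$x - toy$y + rnorm(60, 0, 5)
toy$y[c(3, 17, 41)] <- NA
toy$z[c(8, 29)] <- NA

# Predictive mean matching; m = 5 imputed data sets (an assumed setting)
imp <- mice(toy, m = 5, method = "pmm", seed = 123, printFlag = FALSE)

# Take the first completed data set
toy_imp <- mice::complete(imp, 1)

print(colSums(is.na(toy_imp)))
```

Using a single completed data set keeps the later modeling code simple, at the cost of ignoring the between-imputation variability that multiple imputation is designed to capture.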

Let’s add batting_hbp back into the data.

moneyball_imp$batting_hbp <- batting_hbp
moneyball_imp$batting_hbp_bi <- batting_hbp_bi

Step 4: Is there collinearity among the covariates?

Let’s create a correlation matrix to understand how the variables relate to each other and to the dependent variable. This correlation matrix will help us spot any violation of the assumptions needed to develop a robust OLS model, namely multicollinearity. The caret package can help find highly correlated pairs and even suggest which variable to remove.

The caret package offers findCorrelation(), which takes the correlation matrix as input and flags the fields causing multicollinearity based on a threshold, the cutoff parameter. It returns a vector of column indices that should be removed from the dataset because of high correlation.

colnames(moneyball_imp)[findCorrelation(cor(moneyball_imp))]
[1] "batting_hr"
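
To see which pairs sit behind a flag like this, one option is to list every pair whose absolute correlation exceeds the cutoff. The base-R sketch below does that on made-up data; it is a simple pairwise listing, not caret's algorithm, and `toy`, `cm`, and `high_pairs` are hypothetical names. The cutoff of 0.9 matches the default used by findCorrelation().

```r
set.seed(1)
# Made-up data: b is almost a copy of a, c is unrelated (hypothetical values)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.05), c = rnorm(100))

cm <- cor(toy)
cutoff <- 0.9

# Keep each pair once by looking only at the upper triangle
high <- which(abs(cm) > cutoff & upper.tri(cm), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
print(high_pairs)
```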

Data Transformation

Let’s introduce new variables through transformation:

  1. batting_1B = batting_h-(batting_2b + batting_3b + batting_hr)
  2. free_bases_num = batting_hbp + batting_bb
  3. total_bases = batting_1B + 2 * batting_2b + 3 * batting_3b + 4 * batting_hr + batting_bb + batting_hbp + baserun_sb
  4. total_bases_allowed = pitching_bb + 4 * pitching_hr + pitching_h
  5. HR_over_OP = batting_hr - pitching_hr
  6. walks_over_OP = batting_bb - pitching_bb
  7. SO_over_OP = pitching_so - batting_so
moneyball_imp$batting_1B <- moneyball_imp$batting_h-(moneyball_imp$batting_2b + moneyball_imp$batting_3b + moneyball_imp$batting_hr)
moneyball_imp$free_bases_num <-  if_else(is.na(moneyball_imp$batting_hbp),0,as.numeric(moneyball_imp$batting_hbp)) + moneyball_imp$batting_bb
moneyball_imp$total_bases <- moneyball_imp$batting_1B + 2 * moneyball_imp$batting_2b + 3 * moneyball_imp$batting_3b + 4 * moneyball_imp$batting_hr + moneyball_imp$batting_bb + if_else(is.na(moneyball_imp$batting_hbp),0,as.numeric(moneyball_imp$batting_hbp)) + moneyball_imp$baserun_sb
moneyball_imp$total_bases_allowed = moneyball_imp$pitching_bb + 4 * moneyball_imp$pitching_hr + moneyball_imp$pitching_h
moneyball_imp$HR_over_OP = moneyball_imp$batting_hr - moneyball_imp$pitching_hr
moneyball_imp$walks_over_OP = moneyball_imp$batting_bb - moneyball_imp$pitching_bb
moneyball_imp$SO_over_OP = moneyball_imp$pitching_so - moneyball_imp$batting_so
# make a list of predictors and format them
colnames(moneyball_imp)
 [1] "index"               "target_wins"         "batting_h"           "batting_2b"          "batting_3b"         
 [6] "batting_hr"          "batting_bb"          "batting_so"          "baserun_sb"          "baserun_cs"         
[11] "pitching_h"          "pitching_hr"         "pitching_bb"         "pitching_so"         "fielding_e"         
[16] "fielding_dp"         "batting_hbp"         "batting_hbp_bi"      "batting_1B"          "free_bases_num"     
[21] "total_bases"         "total_bases_allowed" "HR_over_OP"          "walks_over_OP"       "SO_over_OP"         
pred_list <-
  "index + target_wins + batting_h + batting_2b + batting_3b + batting_hr +
batting_bb + batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr +
pitching_bb + pitching_so + fielding_e + fielding_dp + batting_hbp + batting_hbp_bi +
batting_1B + free_bases_num + total_bases + total_bases_allowed + HR_over_OP + walks_over_OP + SO_over_OP"
# keep the new variables in a vector for testing later, in case they don't prove to be of any value.
new_var <- c("batting_1B","free_bases_num","total_bases","total_bases_allowed","HR_over_OP","walks_over_OP","SO_over_OP")

Now that we have imputed the data and created new variables, let’s look at the correlation matrix to understand the correlation between the variables and target_wins.

moneyball_imp <- subset(moneyball_imp, select = -c(batting_hbp))
cor(moneyball_imp)
                           index  target_wins    batting_h   batting_2b   batting_3b   batting_hr  batting_bb  batting_so
index                1.000000000 -0.021056435 -0.017920241  0.011183013 -0.005814683  0.051481047 -0.02656724  0.08519647
target_wins         -0.021056435  1.000000000  0.388767521  0.289103645  0.142608411  0.176153200  0.23255986 -0.03784054
batting_h           -0.017920241  0.388767521  1.000000000  0.562849678  0.427696575 -0.006544685 -0.07246401 -0.42669216
batting_2b           0.011183013  0.289103645  0.562849678  1.000000000 -0.107305824  0.435397293  0.25572610  0.18629939
batting_3b          -0.005814683  0.142608411  0.427696575 -0.107305824  1.000000000 -0.635566946 -0.28723584 -0.67142084
batting_hr           0.051481047  0.176153200 -0.006544685  0.435397293 -0.635566946  1.000000000  0.51373481  0.72695383
batting_bb          -0.026567236  0.232559864 -0.072464013  0.255726103 -0.287235841  0.513734810  1.00000000  0.38595534
batting_so           0.085196474 -0.037840540 -0.426692156  0.186299393 -0.671420839  0.726953830  0.38595534  1.00000000
baserun_sb           0.031624516  0.104456513  0.133389886 -0.200831439  0.531847293 -0.500564937 -0.33797164 -0.30217480
baserun_cs          -0.021998669  0.033689086  0.071021479 -0.301507441  0.616615872 -0.629918304 -0.34803753 -0.42579881
pitching_h           0.017103148 -0.109937054  0.302693709  0.023692188  0.194879411 -0.250145481 -0.44977762 -0.36289699
pitching_hr          0.050985897  0.189013735  0.072853119  0.454550818 -0.567836679  0.969371396  0.45955207  0.66970706
pitching_bb         -0.015287513  0.124174536  0.094193027  0.178054204 -0.002224148  0.136927564  0.48936126  0.05309569
pitching_so          0.056360521 -0.074521077 -0.236077028  0.077734017 -0.263986375  0.194797938 -0.01148177  0.42152734
fielding_e          -0.009233126 -0.176484759  0.264902478 -0.235150986  0.509778447 -0.587339098 -0.65597081 -0.58183930
fielding_dp          0.010177677 -0.065095861  0.053058417  0.301414025 -0.409214014  0.478501657  0.33853564  0.25922140
batting_hbp_bi       0.047332196  0.002610647  0.019594018  0.361922796 -0.265544426  0.392199209  0.10305838  0.39651793
batting_1B          -0.047074417  0.217430135  0.827584756  0.087009889  0.600399234 -0.497294855 -0.35312165 -0.74207113
free_bases_num      -0.019063695  0.228098279 -0.068377971  0.297591911 -0.316009005  0.553966941  0.99101046  0.42997516
total_bases          0.025173504  0.481052452  0.641416724  0.704060978  0.038577619  0.593742440  0.54428557  0.20865100
total_bases_allowed  0.023268954 -0.059959123  0.314205398  0.119290484  0.092039617 -0.062551344 -0.30004852 -0.23084278
HR_over_OP          -0.000553440 -0.060991072 -0.322055891 -0.099453882 -0.243354524  0.074559388  0.19441460  0.19623665
walks_over_OP       -0.004745951  0.052184113 -0.162824365  0.011599182 -0.231156161  0.266798215  0.27356493  0.25533657
SO_over_OP           0.019397099 -0.063151948 -0.046239500 -0.007736561  0.045697731 -0.149785463 -0.20615010 -0.03681741
                     baserun_sb  baserun_cs  pitching_h pitching_hr  pitching_bb  pitching_so   fielding_e  fielding_dp
index                0.03162452 -0.02199867  0.01710315  0.05098590 -0.015287513  0.056360521 -0.009233126  0.010177677
target_wins          0.10445651  0.03368909 -0.10993705  0.18901373  0.124174536 -0.074521077 -0.176484759 -0.065095861
batting_h            0.13338989  0.07102148  0.30269371  0.07285312  0.094193027 -0.236077028  0.264902478  0.053058417
batting_2b          -0.20083144 -0.30150744  0.02369219  0.45455082  0.178054204  0.077734017 -0.235150986  0.301414025
batting_3b           0.53184729  0.61661587  0.19487941 -0.56783668 -0.002224148 -0.263986375  0.509778447 -0.409214014
batting_hr          -0.50056494 -0.62991830 -0.25014548  0.96937140  0.136927564  0.194797938 -0.587339098  0.478501657
batting_bb          -0.33797164 -0.34803753 -0.44977762  0.45955207  0.489361263 -0.011481766 -0.655970815  0.338535639
batting_so          -0.30217480 -0.42579881 -0.36289699  0.66970706  0.053095691  0.421527336 -0.581839303  0.259221402
baserun_sb           1.00000000  0.81901539  0.17588136 -0.44732762  0.031892115  0.055307907  0.598724673 -0.602198238
baserun_cs           0.81901539  1.00000000  0.13387505 -0.59171178 -0.017686782 -0.021450144  0.553690108 -0.612892723
pitching_h           0.17588136  0.13387505  1.00000000 -0.14161276  0.320676162  0.268789756  0.667759010  0.039399815
pitching_hr         -0.44732762 -0.59171178 -0.14161276  1.00000000  0.221937505  0.215006676 -0.493144466  0.467400014
pitching_bb          0.03189212 -0.01768678  0.32067616  0.22193750  1.000000000  0.488322635 -0.022837561  0.207786439
pitching_so          0.05530791 -0.02145014  0.26878976  0.21500668  0.488322635  1.000000000 -0.027229749  0.110776318
fielding_e           0.59872467  0.55369011  0.66775901 -0.49314447 -0.022837561 -0.027229749  1.000000000 -0.411305133
fielding_dp         -0.60219824 -0.61289272  0.03939981  0.46740001  0.207786439  0.110776318 -0.411305133  1.000000000
batting_hbp_bi      -0.13506950 -0.21317271 -0.06445004  0.35794984 -0.016906833  0.134963064 -0.185315470  0.104550628
batting_1B           0.34233016  0.35130817  0.40612014 -0.41549520 -0.022820326 -0.327258839  0.547816415 -0.185951668
free_bases_num      -0.34815545 -0.36821249 -0.44800796  0.49652206  0.476195183  0.006845544 -0.665319984  0.344178637
total_bases          0.02325019 -0.14947902 -0.09016127  0.62224360  0.354242024 -0.010818075 -0.236349233  0.217814783
total_bases_allowed  0.09785104  0.02757418  0.97499650  0.05669475  0.459579945  0.350267877  0.557252830  0.139943309
HR_over_OP          -0.19123177 -0.12375950 -0.42822141 -0.17264012 -0.351988418 -0.091755821 -0.353210656  0.021245563
walks_over_OP       -0.31004785 -0.26355189 -0.71949139  0.12897043 -0.704942270 -0.547928892 -0.508313405  0.046155397
SO_over_OP           0.21244403  0.18983408  0.47814645 -0.09881476  0.511518231  0.890681337  0.261695116 -0.007882676
                    batting_hbp_bi  batting_1B free_bases_num total_bases total_bases_allowed  HR_over_OP walks_over_OP
index                  0.047332196 -0.04707442   -0.019063695  0.02517350         0.023268954 -0.00055344  -0.004745951
target_wins            0.002610647  0.21743014    0.228098279  0.48105245        -0.059959123 -0.06099107   0.052184113
batting_h              0.019594018  0.82758476   -0.068377971  0.64141672         0.314205398 -0.32205589  -0.162824365
batting_2b             0.361922796  0.08700989    0.297591911  0.70406098         0.119290484 -0.09945388   0.011599182
batting_3b            -0.265544426  0.60039923   -0.316009005  0.03857762         0.092039617 -0.24335452  -0.231156161
batting_hr             0.392199209 -0.49729485    0.553966941  0.59374244        -0.062551344  0.07455939   0.266798215
batting_bb             0.103058382 -0.35312165    0.991010459  0.54428557        -0.300048525  0.19441460   0.273564933
batting_so             0.396517931 -0.74207113    0.429975162  0.20865100        -0.230842784  0.19623665   0.255336573
baserun_sb            -0.135069502  0.34233016   -0.348155451  0.02325019         0.097851040 -0.19123177  -0.310047852
baserun_cs            -0.213172712  0.35130817   -0.368212490 -0.14947902         0.027574182 -0.12375950  -0.263551885
pitching_h            -0.064450039  0.40612014   -0.448007961 -0.09016127         0.974996503 -0.42822141  -0.719491389
pitching_hr            0.357949841 -0.41549520    0.496522065  0.62224360         0.056694753 -0.17264012   0.128970430
pitching_bb           -0.016906833 -0.02282033    0.476195183  0.35424202         0.459579945 -0.35198842  -0.704942270
pitching_so            0.134963064 -0.32725884    0.006845544 -0.01081807         0.350267877 -0.09175582  -0.547928892
fielding_e            -0.185315470  0.54781641   -0.665319984 -0.23634923         0.557252830 -0.35321066  -0.508313405
fielding_dp            0.104550628 -0.18595167    0.344178637  0.21781478         0.139943309  0.02124556   0.046155397
batting_hbp_bi         1.000000000 -0.23605172    0.231848863  0.29527604        -0.003909755  0.11953125   0.102464739
batting_1B            -0.236051718  1.00000000   -0.376395883  0.17657688         0.318513233 -0.30736736  -0.262024813
free_bases_num         0.231848863 -0.37639588    1.000000000  0.57120065        -0.293643548  0.20565630   0.280775130
total_bases            0.295276040  0.17657688    0.571200648  1.00000000         0.057905416 -0.14529452   0.051960291
total_bases_allowed   -0.003909755  0.31851323   -0.293643548  0.05790542         1.000000000 -0.48106409  -0.750919119
HR_over_OP             0.119531251 -0.30736736    0.205656303 -0.14529452        -0.481064087  1.00000000   0.546339879
walks_over_OP          0.102464739 -0.26202481    0.280775130  0.05196029        -0.750919119  0.54633988   1.000000000
SO_over_OP            -0.050061610  0.01139092   -0.208022326 -0.11652793         0.501731531 -0.19949844  -0.731836247
                      SO_over_OP
index                0.019397099
target_wins         -0.063151948
batting_h           -0.046239500
batting_2b          -0.007736561
batting_3b           0.045697731
batting_hr          -0.149785463
batting_bb          -0.206150098
batting_so          -0.036817410
baserun_sb           0.212444030
baserun_cs           0.189834080
pitching_h           0.478146452
pitching_hr         -0.098814763
pitching_bb          0.511518231
pitching_so          0.890681337
fielding_e           0.261695116
fielding_dp         -0.007882676
batting_hbp_bi      -0.050061610
batting_1B           0.011390918
free_bases_num      -0.208022326
total_bases         -0.116527933
total_bases_allowed  0.501731531
HR_over_OP          -0.199498439
walks_over_OP       -0.731836247
SO_over_OP           1.000000000

Build a Model

Let’s test a model to establish a baseline

str(moneyball_imp)
'data.frame':   2276 obs. of  24 variables:
 $ index              : int  1 2 3 4 5 6 7 8 11 12 ...
 $ target_wins        : int  39 70 86 70 82 75 80 85 86 76 ...
 $ batting_h          : int  1445 1339 1377 1387 1297 1279 1244 1273 1391 1271 ...
 $ batting_2b         : int  194 219 232 209 186 200 179 171 197 213 ...
 $ batting_3b         : int  39 22 35 38 27 36 54 37 40 18 ...
 $ batting_hr         : int  13 190 137 96 102 92 122 115 114 96 ...
 $ batting_bb         : int  143 685 602 451 472 443 525 456 447 441 ...
 $ batting_so         : int  842 1075 917 922 920 973 1062 1027 922 827 ...
 $ baserun_sb         : int  341 37 46 43 49 107 80 40 69 72 ...
 $ baserun_cs         : int  193 28 27 30 39 59 54 36 27 34 ...
 $ pitching_h         : int  9364 1347 1377 1396 1297 1279 1244 1281 1391 1271 ...
 $ pitching_hr        : int  84 191 137 97 102 92 122 116 114 96 ...
 $ pitching_bb        : int  927 689 602 454 472 443 525 459 447 441 ...
 $ pitching_so        : int  5456 1082 917 928 920 973 1062 1033 922 827 ...
 $ fielding_e         : int  1011 193 175 164 138 123 136 112 127 131 ...
 $ fielding_dp        : int  178 155 153 156 168 149 186 136 169 159 ...
 $ batting_hbp_bi     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ batting_1B         : int  1199 908 973 1044 982 951 889 950 1040 944 ...
 $ free_bases_num     : num  143 685 602 451 472 443 525 456 447 441 ...
 $ total_bases        : num  2240 2894 2738 2454 2364 ...
 $ total_bases_allowed: num  10627 2800 2527 2238 2177 ...
 $ HR_over_OP         : int  -71 -1 0 -1 0 0 0 -1 0 0 ...
 $ walks_over_OP      : int  -784 -4 0 -3 0 0 0 -3 0 0 ...
 $ SO_over_OP         : int  4614 7 0 6 0 0 0 6 0 0 ...
base_model_all <- lm(target_wins ~ batting_h + batting_2b + batting_3b + batting_hr + batting_bb + batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + pitching_bb + pitching_so + fielding_e + fielding_dp + batting_hbp + batting_hbp_bi + batting_1B + free_bases_num + total_bases + total_bases_allowed + HR_over_OP + walks_over_OP + SO_over_OP, data = moneyball_imp)
par(mfrow=c(2,2))
plot(base_model_all)

summary(base_model_all)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_3b + 
    batting_hr + batting_bb + batting_so + baserun_sb + baserun_cs + 
    pitching_h + pitching_hr + pitching_bb + pitching_so + fielding_e + 
    fielding_dp + batting_hbp + batting_hbp_bi + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP + 
    walks_over_OP + SO_over_OP, data = moneyball_imp)

Residuals:
     Min       1Q   Median       3Q      Max 
-19.8708  -5.6564  -0.0599   5.2545  22.9274 

Coefficients: (8 not defined because of singularities)
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         60.28826   19.67842   3.064  0.00253 ** 
batting_h            1.91348    2.76139   0.693  0.48927    
batting_2b           0.02639    0.03029   0.871  0.38484    
batting_3b          -0.10118    0.07751  -1.305  0.19348    
batting_hr          -4.84371   10.50851  -0.461  0.64542    
batting_bb          -4.45969    3.63624  -1.226  0.22167    
batting_so           0.34196    2.59876   0.132  0.89546    
baserun_sb           0.03304    0.02867   1.152  0.25071    
baserun_cs          -0.01104    0.07143  -0.155  0.87730    
pitching_h          -1.89096    2.76095  -0.685  0.49432    
pitching_hr          4.93043   10.50664   0.469  0.63946    
pitching_bb          4.51089    3.63372   1.241  0.21612    
pitching_so         -0.37364    2.59705  -0.144  0.88577    
fielding_e          -0.17204    0.04140  -4.155 5.08e-05 ***
fielding_dp         -0.10819    0.03654  -2.961  0.00349 ** 
batting_hbp          0.08247    0.04960   1.663  0.09815 .  
batting_hbp_bi            NA         NA      NA       NA    
batting_1B                NA         NA      NA       NA    
free_bases_num            NA         NA      NA       NA    
total_bases               NA         NA      NA       NA    
total_bases_allowed       NA         NA      NA       NA    
HR_over_OP                NA         NA      NA       NA    
walks_over_OP             NA         NA      NA       NA    
SO_over_OP                NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.467 on 175 degrees of freedom
  (2085 observations deleted due to missingness)
Multiple R-squared:  0.5501,    Adjusted R-squared:  0.5116 
F-statistic: 14.27 on 15 and 175 DF,  p-value: < 2.2e-16
mse <- function(sm) 
  mean(sm$residuals^2)
paste('MSE equal ', mse(base_model_all))
[1] "MSE equal  65.6852879651226"
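
The summary above reports observations deleted due to missingness and coefficients not defined because of singularities. The sketch below reproduces both effects on made-up data, as an illustration of the mechanism only: a mostly missing predictor shrinks the sample that lm() uses, and a predictor that is an exact sum of others gets an NA estimate. All names and values here are hypothetical.

```r
set.seed(7)
# Made-up data: s is an exact sum of x1 and x2, and x3 is mostly missing
toy <- data.frame(x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40))
toy$s <- toy$x1 + toy$x2
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(40)
toy$x3[1:30] <- NA

fit <- lm(y ~ x1 + x2 + x3 + s, data = toy)

print(nobs(fit))    # rows actually used after dropping incomplete cases
print(coef(fit))    # s is an exact linear combination, so its estimate is NA
```

Checking nobs() against nrow() of the input is a quick guard before comparing R-squared or MSE across models fitted on different numbers of rows.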

Although R-squared and adjusted R-squared are high, the summary shows that this model dropped 2085 observations because of missing values and could not estimate eight coefficients because of singularities. Let’s set the new variables aside and build a model without them.

moneyball_orig <- moneyball_imp[,1:17]
base_model_orig <-
  lm(target_wins ~ batting_h + batting_2b + batting_3b + batting_hr + batting_bb + batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + pitching_bb + pitching_so + fielding_e + fielding_dp, data = moneyball_orig)
  par(mfrow = c(2, 2))
  plot(base_model_orig)

  summary(base_model_orig)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_3b + 
    batting_hr + batting_bb + batting_so + baserun_sb + baserun_cs + 
    pitching_h + pitching_hr + pitching_bb + pitching_so + fielding_e + 
    fielding_dp, data = moneyball_orig)

Residuals:
    Min      1Q  Median      3Q     Max 
-46.207  -8.319   0.073   8.288  53.476 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 35.7456730  5.1101841   6.995 3.48e-12 ***
batting_h    0.0431762  0.0035717  12.088  < 2e-16 ***
batting_2b  -0.0189845  0.0088468  -2.146  0.03199 *  
batting_3b   0.0328935  0.0165608   1.986  0.04713 *  
batting_hr   0.0696203  0.0264228   2.635  0.00847 ** 
batting_bb   0.0120441  0.0055827   2.157  0.03108 *  
batting_so  -0.0158916  0.0024545  -6.474 1.16e-10 ***
baserun_sb   0.0523104  0.0052785   9.910  < 2e-16 ***
baserun_cs  -0.0092764  0.0104461  -0.888  0.37462    
pitching_h   0.0014497  0.0003833   3.782  0.00016 ***
pitching_hr  0.0107640  0.0234254   0.459  0.64592    
pitching_bb -0.0025039  0.0039757  -0.630  0.52889    
pitching_so  0.0014824  0.0008894   1.667  0.09571 .  
fielding_e  -0.0410916  0.0026638 -15.426  < 2e-16 ***
fielding_dp -0.1187948  0.0125159  -9.492  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.59 on 2261 degrees of freedom
Multiple R-squared:  0.3649,    Adjusted R-squared:  0.361 
F-statistic:  92.8 on 14 and 2261 DF,  p-value: < 2.2e-16
  paste('MSE equal ', mse(base_model_orig))
[1] "MSE equal  157.509836420181"

From a performance point of view (R-squared), this model looks good, but the variance of the residuals leaves me uneasy.
Let’s build another model that includes only the variables with low p-values.

base_model_lp <-
  lm(target_wins ~ batting_h + batting_2b + batting_hr + batting_bb + batting_so + baserun_sb + pitching_h + pitching_so + fielding_e + fielding_dp, data = moneyball_orig)
  par(mfrow = c(2, 2))
  plot(base_model_lp)

  summary(base_model_lp)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_hr + 
    batting_bb + batting_so + baserun_sb + pitching_h + pitching_so + 
    fielding_e + fielding_dp, data = moneyball_orig)

Residuals:
    Min      1Q  Median      3Q     Max 
-46.974  -8.351   0.134   8.278  52.051 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.2536191  4.9339802   6.942 5.02e-12 ***
batting_h    0.0458731  0.0033148  13.839  < 2e-16 ***
batting_2b  -0.0202481  0.0087922  -2.303  0.02137 *  
batting_hr   0.0769585  0.0088515   8.694  < 2e-16 ***
batting_bb   0.0097206  0.0030308   3.207  0.00136 ** 
batting_so  -0.0160263  0.0023549  -6.806 1.28e-11 ***
baserun_sb   0.0509706  0.0041882  12.170  < 2e-16 ***
pitching_h   0.0013127  0.0003368   3.897  0.00010 ***
pitching_so  0.0010922  0.0006662   1.640  0.10124    
fielding_e  -0.0409861  0.0026598 -15.410  < 2e-16 ***
fielding_dp -0.1191827  0.0123487  -9.651  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.59 on 2265 degrees of freedom
Multiple R-squared:  0.3636,    Adjusted R-squared:  0.3608 
F-statistic: 129.4 on 10 and 2265 DF,  p-value: < 2.2e-16
  paste('MSE equal ', mse(base_model_lp))
[1] "MSE equal  157.839856479662"

Let’s remove the variables causing multicollinearity using findCorrelation().

to_rm <- colnames(cor(moneyball_imp)[,findCorrelation(cor(moneyball_imp))])
to_rm
[1] "batting_hr"     "free_bases_num" "pitching_h"    
base_model_noCol <-
  lm(target_wins ~ batting_h + batting_2b + batting_bb + batting_so + baserun_sb + pitching_so + fielding_e + fielding_dp, data = moneyball_orig)
  par(mfrow = c(2, 2))
  plot(base_model_noCol)

  summary(base_model_noCol)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_bb + 
    batting_so + baserun_sb + pitching_so + fielding_e + fielding_dp, 
    data = moneyball_orig)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.479  -8.460   0.291   8.567  46.090 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.9921051  4.2605180   2.111 0.034919 *  
batting_h    0.0579261  0.0031329  18.489  < 2e-16 ***
batting_2b  -0.0179472  0.0089646  -2.002 0.045403 *  
batting_bb   0.0157299  0.0029952   5.252 1.65e-07 ***
batting_so  -0.0027274  0.0017083  -1.597 0.110494    
baserun_sb   0.0357246  0.0039159   9.123  < 2e-16 ***
pitching_so  0.0019710  0.0005953   3.311 0.000945 ***
fielding_e  -0.0334027  0.0021195 -15.759  < 2e-16 ***
fielding_dp -0.0923827  0.0120924  -7.640 3.19e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.85 on 2267 degrees of freedom
Multiple R-squared:  0.3371,    Adjusted R-squared:  0.3348 
F-statistic: 144.1 on 8 and 2267 DF,  p-value: < 2.2e-16
  paste('MSE equal ', mse(base_model_noCol))
[1] "MSE equal  164.403742392932"
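
The diagnostic plots drawn by plot() on an lm object include a residuals-versus-leverage panel with Cook’s distance contours. As a rough numerical companion, the sketch below computes Cook’s distance directly on made-up data and flags points above 4/n, which is a common rule of thumb rather than a formal test. The data and the names `cd`, `threshold`, and `influential` are hypothetical.

```r
set.seed(3)
# Made-up regression data with one deliberately influential point (row 31)
x <- c(rnorm(30), 8)
y <- c(2 * x[1:30] + rnorm(30), -20)
fit <- lm(y ~ x)

cd <- cooks.distance(fit)
threshold <- 4 / length(cd)          # rule-of-thumb cutoff, not a formal test

influential <- which(cd > threshold)
print(influential)
```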

Though the R-squared value went down, the Cook’s distance chart shows some improvement. Now let’s use the caret package to apply the transformations we discussed earlier in the exploration phase.

  1. Center and Scale the data
  2. Fix the problem with outliers by using the spatial sign transformation
  3. Last but not least, a Box-Cox transformation to take care of the skewness
trans <- preProcess(moneyball_imp, method = c("BoxCox"))
transformed <- predict(trans, moneyball_imp)
head(transformed)
      index target_wins batting_h batting_2b batting_3b batting_hr batting_bb
1 0.0000000          39 0.7691708   37.64575         39         13        143
2 0.8921497          70 0.7691645   40.61141         22        190        685
3 1.6538133          86 0.7691669   42.09981         35        137        602
4 2.3414512          70 0.7691675   39.44230         38         96        451
5 2.9788133          82 0.7691617   36.66490         27        102        472
6 3.5787773          75 0.7691604   38.37081         36         92        443
  batting_so baserun_sb baserun_cs pitching_h pitching_hr pitching_bb pitching_so
1        842        341        193  0.5000000          84         927        5456
2       1075         37         28  0.4999997         191         689        1082
3        917         46         27  0.4999997         137         602         917
4        922         43         30  0.4999997          97         454         928
5        920         49         39  0.4999997         102         472         920
6        973        107         59  0.4999997          92         443         973
  fielding_e fielding_dp batting_hbp_bi batting_1B free_bases_num total_bases
1   1.108916    2491.398              0  0.4999997            143    8731.220
2   1.101367    1996.525              0  0.4999994            685   11873.712
3   1.100469    1955.454              0  0.4999995            602   11109.802
4   1.099829    2017.181              0  0.4999995            451    9741.618
5   1.097933    2271.200              0  0.4999995            472    9314.443
6   1.096495    1874.275              0  0.4999994            443    9375.948
  total_bases_allowed HR_over_OP walks_over_OP SO_over_OP
1           0.5263158        -71          -784       4614
2           0.5263156         -1            -4          7
3           0.5263156          0             0          0
4           0.5263156         -1            -3          6
5           0.5263155          0             0          0
6           0.5263155          0             0          0
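
The `preProcess()` call above requests only `"BoxCox"`, while the list of steps also mentions centering, scaling, and the spatial sign. In `caret` those steps can be requested together by adding `"center"`, `"scale"`, and `"spatialSign"` to the `method` vector. To show what the first two steps do to the data, here is a minimal base-R sketch. It uses the built-in `mtcars` data as a stand-in for `moneyball_imp`, which is an assumption made purely for illustration.

```r
# Sketch only: mtcars stands in for moneyball_imp (an assumption for illustration).
x <- as.matrix(mtcars[, c("mpg", "disp", "hp", "wt")])

# Step 1: center and scale each column.
x_scaled <- scale(x, center = TRUE, scale = TRUE)

# Step 2: spatial sign, i.e. divide every row by its Euclidean norm so that
# each observation lands on the unit sphere and extreme rows are pulled in.
row_norms <- sqrt(rowSums(x_scaled^2))
x_ss <- x_scaled / row_norms

# Every transformed row now has length 1.
range(sqrt(rowSums(x_ss^2)))
```

Because the spatial sign works on whole rows, it only makes sense after the columns share a common scale, which is why centering and scaling come first.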
trans_model_all <- lm(
  target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + batting_so +
    baserun_sb + baserun_cs + pitching_h + pitching_hr + pitching_bb +
    pitching_so + fielding_e + fielding_dp + batting_1B + free_bases_num +
    total_bases + total_bases_allowed + HR_over_OP + walks_over_OP + SO_over_OP,
  data = transformed
)
par(mfrow = c(2, 2))
plot(trans_model_all)

summary(trans_model_all)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_3b + 
    batting_bb + batting_so + baserun_sb + baserun_cs + pitching_h + 
    pitching_hr + pitching_bb + pitching_so + fielding_e + fielding_dp + 
    batting_1B + free_bases_num + total_bases + total_bases_allowed + 
    HR_over_OP + walks_over_OP + SO_over_OP, data = transformed)

Residuals:
    Min      1Q  Median      3Q     Max 
-53.068  -7.876  -0.019   8.154  54.708 

Coefficients: (5 not defined because of singularities)
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)         -1.094e+05  1.205e+05  -0.908  0.36383    
batting_h            1.452e+05  1.566e+05   0.927  0.35409    
batting_2b          -3.757e-01  1.154e-01  -3.256  0.00114 ** 
batting_3b           3.242e-02  2.369e-02   1.368  0.17129    
batting_bb           1.433e-01  1.837e-02   7.802 9.27e-15 ***
batting_so          -1.766e-02  2.567e-03  -6.881 7.68e-12 ***
baserun_sb           1.227e-03  9.393e-03   0.131  0.89610    
baserun_cs           9.062e-03  1.056e-02   0.858  0.39074    
pitching_h                  NA         NA      NA       NA    
pitching_hr         -4.568e-02  2.612e-02  -1.749  0.08045 .  
pitching_bb         -7.883e-03  3.459e-03  -2.279  0.02277 *  
pitching_so          3.592e-03  8.920e-04   4.027 5.85e-05 ***
fielding_e          -1.985e+03  1.311e+02 -15.137  < 2e-16 ***
fielding_dp         -6.610e-03  6.394e-04 -10.337  < 2e-16 ***
batting_1B                  NA         NA      NA       NA    
free_bases_num      -1.455e-01  2.066e-02  -7.043 2.49e-12 ***
total_bases          6.612e-03  1.590e-03   4.158 3.33e-05 ***
total_bases_allowed         NA         NA      NA       NA    
HR_over_OP          -4.014e-02  3.440e-02  -1.167  0.24340    
walks_over_OP               NA         NA      NA       NA    
SO_over_OP                  NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 12.67 on 2260 degrees of freedom
Multiple R-squared:  0.3571,    Adjusted R-squared:  0.3529 
F-statistic:  83.7 on 15 and 2260 DF,  p-value: < 2.2e-16
  
  paste('MSE equal ', mse(trans_model_all))
[1] "MSE equal  159.445223911529"
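
The `mse()` helper is called here without its definition appearing in this section. A plausible reading, and only an assumption, is that it returns the mean of the squared residuals. That reading is consistent with the numbers in this report: the final stepwise model has an RSS of 1449.6 over 2276 observations (2260 residual degrees of freedom plus 16 coefficients), which gives roughly 0.637, the MSE reported for that model. The name `mse_sketch` below is hypothetical.

```r
# Hypothetical helper: the source calls mse() without showing its definition.
mse_sketch <- function(model) {
  mean(residuals(model)^2)  # residual sum of squares divided by n
}

# Toy model on a built-in dataset, used only to make the sketch runnable.
fit <- lm(mpg ~ wt + hp, data = mtcars)
mse_sketch(fit)
```

Note that this divides by the number of observations, not by the residual degrees of freedom, so it is slightly smaller than the squared residual standard error printed by `summary()`.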
par(mfrow = c(1, 3))
# One row of three plots (dotplot, boxplot, histogram) per column 2..17
for (i in 2:17) {
  col_name <- colnames(transformed)[i]

  plot(transformed[, i], transformed$index, xlab = col_name, ylab = "Index",
       main = paste("Cleveland dotplot of", col_name))
  boxplot(transformed[, i], col = "#A71930", main = paste("Boxplot of", col_name))
  hist(
    transformed[, i],
    col = "#A71930",
    xlab = col_name,
    main = paste("Histogram of", col_name)
  )
}

The Cook’s distance plot makes it clear that some observations are influential, but the remaining diagnostic plots look as expected.
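
To move from reading the plot to listing the influential rows, the distances can be computed directly. The sketch below uses a toy model on `mtcars` in place of `trans_model_all`, and the `4 / n` cutoff is a common rule of thumb chosen here as an assumption, not a value taken from this analysis.

```r
# Sketch only: a toy model on mtcars stands in for trans_model_all.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance for every observation.
cooks <- cooks.distance(fit)

# 4 / n is a common rule-of-thumb cutoff, not a formal test (an assumption here).
cutoff <- 4 / nrow(mtcars)
influential <- names(cooks)[cooks > cutoff]

influential
```

Rows flagged this way are candidates for a closer look, not automatic deletions.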

Let’s try a stepwise approach:

  1. Both directions

stepwise_base_model_bd <- stepAIC(trans_model_all, direction = "both")
Start:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP + 
    walks_over_OP + SO_over_OP


Step:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP + 
    walks_over_OP


Step:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
- batting_1B           1     0.017 1447.9 -993.47
- baserun_cs           1     0.768 1448.6 -992.29
- pitching_hr          1     0.786 1448.7 -992.26
- baserun_sb           1     0.990 1448.9 -991.94
- batting_3b           1     1.038 1448.9 -991.86
- batting_2b           1     1.228 1449.1 -991.56
<none>                             1447.9 -991.49
- total_bases_allowed  1     1.301 1449.2 -991.45
- batting_h            1     1.672 1449.5 -990.87
- pitching_bb          1     2.362 1450.2 -989.78
- HR_over_OP           1     2.569 1450.4 -989.46
- total_bases          1     6.092 1454.0 -983.94
- pitching_h           1     7.152 1455.0 -982.28
- pitching_so          1    14.666 1462.5 -970.56
- free_bases_num       1    22.046 1469.9 -959.10
- batting_bb           1    26.489 1474.4 -952.23
- batting_so           1    42.066 1489.9 -928.31
- fielding_dp          1    68.348 1516.2 -888.51
- fielding_e           1   120.742 1568.6 -811.19

Step:  AIC=-993.47
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + free_bases_num + 
    total_bases + total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
- baserun_cs           1     0.780 1448.7 -994.24
- pitching_hr          1     0.973 1448.9 -993.94
- baserun_sb           1     1.061 1449.0 -993.80
<none>                             1447.9 -993.47
- batting_3b           1     1.979 1449.9 -992.36
- total_bases_allowed  1     2.038 1449.9 -992.27
- pitching_bb          1     2.378 1450.3 -991.73
+ batting_1B           1     0.017 1447.9 -991.49
- HR_over_OP           1     2.872 1450.8 -990.96
- batting_2b           1     2.911 1450.8 -990.90
- batting_h            1     4.817 1452.7 -987.91
- total_bases          1     6.238 1454.1 -985.68
- pitching_h           1    10.187 1458.1 -979.51
- pitching_so          1    15.166 1463.0 -971.75
- free_bases_num       1    24.275 1472.2 -957.62
- batting_bb           1    29.578 1477.5 -949.44
- batting_so           1    42.122 1490.0 -930.20
- fielding_dp          1    69.730 1517.6 -888.41
- fielding_e           1   121.012 1568.9 -812.78

Step:  AIC=-994.24
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + pitching_h + pitching_hr + pitching_bb + 
    pitching_so + fielding_e + fielding_dp + free_bases_num + 
    total_bases + total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
- pitching_hr          1     0.903 1449.6 -994.82
<none>                             1448.7 -994.24
- total_bases_allowed  1     1.657 1450.3 -993.64
+ baserun_cs           1     0.780 1447.9 -993.47
- baserun_sb           1     2.039 1450.7 -993.04
- batting_3b           1     2.356 1451.0 -992.54
- pitching_bb          1     2.363 1451.0 -992.53
+ batting_1B           1     0.028 1448.6 -992.29
- HR_over_OP           1     2.705 1451.4 -992.00
- batting_2b           1     2.983 1451.7 -991.56
- batting_h            1     4.817 1453.5 -988.69
- total_bases          1     6.114 1454.8 -986.66
- pitching_h           1     9.596 1458.3 -981.22
- pitching_so          1    15.573 1464.2 -971.91
- free_bases_num       1    24.033 1472.7 -958.79
- batting_bb           1    29.558 1478.2 -950.27
- batting_so           1    42.228 1490.9 -930.85
- fielding_dp          1    71.951 1520.6 -885.92
- fielding_e           1   120.362 1569.0 -814.59

Step:  AIC=-994.82
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + pitching_h + pitching_bb + pitching_so + 
    fielding_e + fielding_dp + free_bases_num + total_bases + 
    total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
<none>                             1449.6 -994.82
- total_bases_allowed  1     1.421 1451.0 -994.60
+ pitching_hr          1     0.903 1448.7 -994.24
+ baserun_cs           1     0.709 1448.9 -993.94
- HR_over_OP           1     1.859 1451.4 -993.91
- pitching_bb          1     2.025 1451.6 -993.65
- batting_2b           1     2.113 1451.7 -993.51
+ batting_1B           1     0.160 1449.4 -993.08
- batting_3b           1     7.911 1457.5 -984.44
- pitching_h           1     9.605 1459.2 -981.79
- total_bases          1    11.184 1460.8 -979.33
- baserun_sb           1    14.380 1464.0 -974.36
- pitching_so          1    15.504 1465.1 -972.61
- batting_h            1    15.586 1465.2 -972.48
- free_bases_num       1    23.876 1473.5 -959.64
- batting_bb           1    28.848 1478.4 -951.97
- batting_so           1    51.216 1500.8 -917.80
- fielding_dp          1    71.451 1521.0 -887.32
- fielding_e           1   120.809 1570.4 -814.63
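
The AIC column in the trace above can be reproduced by hand. For `lm` fits, `stepAIC()` uses `extractAIC()`, which computes `n * log(RSS / n) + 2 * edf`. Plugging in the final step (RSS of 1449.6, 2276 observations, 16 coefficients) gives roughly -994.8, in line with the trace. A minimal sketch on a toy `mtcars` model, used here only for illustration:

```r
# Toy model on mtcars, used only to make the sketch runnable.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
rss <- sum(residuals(fit)^2)
edf <- length(coef(fit))          # number of estimated coefficients

aic_by_hand <- n * log(rss / n) + 2 * edf
aic_by_hand
extractAIC(fit)[2]                # same quantity as the AIC column in a stepwise trace
```

This also explains why the values are negative: when RSS / n is below 1, as it is for a standardized response, the log term is negative.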
par(mfrow = c(2, 2))
  plot(stepwise_base_model_bd)

  summary(stepwise_base_model_bd)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_3b + 
    batting_bb + batting_so + baserun_sb + pitching_h + pitching_bb + 
    pitching_so + fielding_e + fielding_dp + free_bases_num + 
    total_bases + total_bases_allowed + HR_over_OP, data = transformed)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7578 -0.5110 -0.0044  0.5140  3.2705 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          6.900e-11  1.679e-02   0.000 1.000000    
batting_h            2.724e-01  5.525e-02   4.929 8.85e-07 ***
batting_2b          -5.554e-02  3.060e-02  -1.815 0.069640 .  
batting_3b           1.149e-01  3.272e-02   3.512 0.000453 ***
batting_bb           9.685e-01  1.444e-01   6.706 2.51e-11 ***
batting_so          -3.603e-01  4.032e-02  -8.936  < 2e-16 ***
baserun_sb           1.612e-01  3.404e-02   4.735 2.33e-06 ***
pitching_h          -2.747e-01  7.099e-02  -3.870 0.000112 ***
pitching_bb         -6.647e-02  3.741e-02  -1.777 0.075761 .  
pitching_so          1.534e-01  3.120e-02   4.917 9.44e-07 ***
fielding_e          -5.182e-01  3.776e-02 -13.724  < 2e-16 ***
fielding_dp         -2.421e-01  2.294e-02 -10.554  < 2e-16 ***
free_bases_num      -9.498e-01  1.557e-01  -6.101 1.23e-09 ***
total_bases          3.222e-01  7.715e-02   4.176 3.08e-05 ***
total_bases_allowed  8.224e-02  5.526e-02   1.488 0.136825    
HR_over_OP          -4.188e-02  2.460e-02  -1.702 0.088816 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8009 on 2260 degrees of freedom
Multiple R-squared:  0.3628,    Adjusted R-squared:  0.3586 
F-statistic: 85.79 on 15 and 2260 DF,  p-value: < 2.2e-16
paste('MSE equal ', mse(stepwise_base_model_bd))
[1] "MSE equal  0.636893215447998"
  2. Forward direction
stepwise_base_model_fw <- stepAIC(trans_model_all, direction = "forward")
Start:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP + 
    walks_over_OP + SO_over_OP
par(mfrow = c(2, 2))
  plot(stepwise_base_model_fw)

  summary(stepwise_base_model_fw)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_3b + 
    batting_bb + batting_so + baserun_sb + baserun_cs + pitching_h + 
    pitching_hr + pitching_bb + pitching_so + fielding_e + fielding_dp + 
    batting_1B + free_bases_num + total_bases + total_bases_allowed + 
    HR_over_OP + walks_over_OP + SO_over_OP, data = transformed)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7240 -0.5110 -0.0064  0.5072  3.2284 

Coefficients: (2 not defined because of singularities)
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          7.167e-11  1.679e-02   0.000 1.000000    
batting_h            2.289e-01  1.418e-01   1.614 0.106559    
batting_2b          -9.290e-02  6.714e-02  -1.384 0.166575    
batting_3b           7.033e-02  5.528e-02   1.272 0.203418    
batting_bb           9.937e-01  1.546e-01   6.426 1.59e-10 ***
batting_so          -3.439e-01  4.247e-02  -8.098 9.05e-16 ***
baserun_sb           7.878e-02  6.342e-02   1.242 0.214247    
baserun_cs           3.853e-02  3.521e-02   1.094 0.273929    
pitching_h          -2.789e-01  8.353e-02  -3.339 0.000854 ***
pitching_hr         -1.385e-01  1.251e-01  -1.107 0.268323    
pitching_bb         -7.248e-02  3.778e-02  -1.919 0.055141 .  
pitching_so          1.511e-01  3.161e-02   4.781 1.85e-06 ***
fielding_e          -5.321e-01  3.878e-02 -13.719  < 2e-16 ***
fielding_dp         -2.401e-01  2.326e-02 -10.322  < 2e-16 ***
batting_1B          -2.097e-02  1.296e-01  -0.162 0.871526    
free_bases_num      -1.036e+00  1.767e-01  -5.862 5.24e-09 ***
total_bases          4.946e-01  1.605e-01   3.082 0.002083 ** 
total_bases_allowed  9.507e-02  6.676e-02   1.424 0.154599    
HR_over_OP          -7.378e-02  3.687e-02  -2.001 0.045487 *  
walks_over_OP               NA         NA      NA       NA    
SO_over_OP                  NA         NA      NA       NA    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8009 on 2257 degrees of freedom
Multiple R-squared:  0.3636,    Adjusted R-squared:  0.3585 
F-statistic: 71.63 on 18 and 2257 DF,  p-value: < 2.2e-16
paste('MSE equal ', mse(stepwise_base_model_fw))
[1] "MSE equal  0.636146688776401"
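
The forward trace above shows only the `Start` step, and the resulting summary still contains every term of the full model. That is expected: when no `scope` is given, the starting model is used as the upper limit, so a forward search that begins at the full model has nothing left to add. A forward search normally starts from an intercept-only model and is given the full formula as its upper scope. The sketch below shows that pattern on `mtcars`, which stands in for the transformed data as an assumption. The names `null_model`, `full_model`, and `forward_fit` are illustrative.

```r
library(MASS)

# Toy data: mtcars stands in for the transformed moneyball data (an assumption).
null_model <- lm(mpg ~ 1, data = mtcars)                 # intercept only
full_model <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Forward search: start from the null model and allow terms up to the full formula.
forward_fit <- stepAIC(
  null_model,
  scope = list(lower = formula(null_model), upper = formula(full_model)),
  direction = "forward",
  trace = FALSE
)

formula(forward_fit)
```

Run this way, the forward result can differ from the backward and both-direction results, which makes the three-way comparison more informative.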
  3. Backward direction
stepwise_base_model_bw <- stepAIC(trans_model_all, direction = "backward")
Start:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP + 
    walks_over_OP + SO_over_OP


Step:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP + 
    walks_over_OP


Step:  AIC=-991.49
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + batting_1B + 
    free_bases_num + total_bases + total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
- batting_1B           1     0.017 1447.9 -993.47
- baserun_cs           1     0.768 1448.6 -992.29
- pitching_hr          1     0.786 1448.7 -992.26
- baserun_sb           1     0.990 1448.9 -991.94
- batting_3b           1     1.038 1448.9 -991.86
- batting_2b           1     1.228 1449.1 -991.56
<none>                             1447.9 -991.49
- total_bases_allowed  1     1.301 1449.2 -991.45
- batting_h            1     1.672 1449.5 -990.87
- pitching_bb          1     2.362 1450.2 -989.78
- HR_over_OP           1     2.569 1450.4 -989.46
- total_bases          1     6.092 1454.0 -983.94
- pitching_h           1     7.152 1455.0 -982.28
- pitching_so          1    14.666 1462.5 -970.56
- free_bases_num       1    22.046 1469.9 -959.10
- batting_bb           1    26.489 1474.4 -952.23
- batting_so           1    42.066 1489.9 -928.31
- fielding_dp          1    68.348 1516.2 -888.51
- fielding_e           1   120.742 1568.6 -811.19

Step:  AIC=-993.47
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + baserun_cs + pitching_h + pitching_hr + 
    pitching_bb + pitching_so + fielding_e + fielding_dp + free_bases_num + 
    total_bases + total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
- baserun_cs           1     0.780 1448.7 -994.24
- pitching_hr          1     0.973 1448.9 -993.94
- baserun_sb           1     1.061 1449.0 -993.80
<none>                             1447.9 -993.47
- batting_3b           1     1.979 1449.9 -992.36
- total_bases_allowed  1     2.038 1449.9 -992.27
- pitching_bb          1     2.378 1450.3 -991.73
- HR_over_OP           1     2.872 1450.8 -990.96
- batting_2b           1     2.911 1450.8 -990.90
- batting_h            1     4.817 1452.7 -987.91
- total_bases          1     6.238 1454.1 -985.68
- pitching_h           1    10.187 1458.1 -979.51
- pitching_so          1    15.166 1463.0 -971.75
- free_bases_num       1    24.275 1472.2 -957.62
- batting_bb           1    29.578 1477.5 -949.44
- batting_so           1    42.122 1490.0 -930.20
- fielding_dp          1    69.730 1517.6 -888.41
- fielding_e           1   121.012 1568.9 -812.78

Step:  AIC=-994.24
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + pitching_h + pitching_hr + pitching_bb + 
    pitching_so + fielding_e + fielding_dp + free_bases_num + 
    total_bases + total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
- pitching_hr          1     0.903 1449.6 -994.82
<none>                             1448.7 -994.24
- total_bases_allowed  1     1.657 1450.3 -993.64
- baserun_sb           1     2.039 1450.7 -993.04
- batting_3b           1     2.356 1451.0 -992.54
- pitching_bb          1     2.363 1451.0 -992.53
- HR_over_OP           1     2.705 1451.4 -992.00
- batting_2b           1     2.983 1451.7 -991.56
- batting_h            1     4.817 1453.5 -988.69
- total_bases          1     6.114 1454.8 -986.66
- pitching_h           1     9.596 1458.3 -981.22
- pitching_so          1    15.573 1464.2 -971.91
- free_bases_num       1    24.033 1472.7 -958.79
- batting_bb           1    29.558 1478.2 -950.27
- batting_so           1    42.228 1490.9 -930.85
- fielding_dp          1    71.951 1520.6 -885.92
- fielding_e           1   120.362 1569.0 -814.59

Step:  AIC=-994.82
target_wins ~ batting_h + batting_2b + batting_3b + batting_bb + 
    batting_so + baserun_sb + pitching_h + pitching_bb + pitching_so + 
    fielding_e + fielding_dp + free_bases_num + total_bases + 
    total_bases_allowed + HR_over_OP

                      Df Sum of Sq    RSS     AIC
<none>                             1449.6 -994.82
- total_bases_allowed  1     1.421 1451.0 -994.60
- HR_over_OP           1     1.859 1451.4 -993.91
- pitching_bb          1     2.025 1451.6 -993.65
- batting_2b           1     2.113 1451.7 -993.51
- batting_3b           1     7.911 1457.5 -984.44
- pitching_h           1     9.605 1459.2 -981.79
- total_bases          1    11.184 1460.8 -979.33
- baserun_sb           1    14.380 1464.0 -974.36
- pitching_so          1    15.504 1465.1 -972.61
- batting_h            1    15.586 1465.2 -972.48
- free_bases_num       1    23.876 1473.5 -959.64
- batting_bb           1    28.848 1478.4 -951.97
- batting_so           1    51.216 1500.8 -917.80
- fielding_dp          1    71.451 1521.0 -887.32
- fielding_e           1   120.809 1570.4 -814.63
par(mfrow = c(2, 2))
  plot(stepwise_base_model_bw)

  summary(stepwise_base_model_bw)

Call:
lm(formula = target_wins ~ batting_h + batting_2b + batting_3b + 
    batting_bb + batting_so + baserun_sb + pitching_h + pitching_bb + 
    pitching_so + fielding_e + fielding_dp + free_bases_num + 
    total_bases + total_bases_allowed + HR_over_OP, data = transformed)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7578 -0.5110 -0.0044  0.5140  3.2705 

Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)          6.900e-11  1.679e-02   0.000 1.000000    
batting_h            2.724e-01  5.525e-02   4.929 8.85e-07 ***
batting_2b          -5.554e-02  3.060e-02  -1.815 0.069640 .  
batting_3b           1.149e-01  3.272e-02   3.512 0.000453 ***
batting_bb           9.685e-01  1.444e-01   6.706 2.51e-11 ***
batting_so          -3.603e-01  4.032e-02  -8.936  < 2e-16 ***
baserun_sb           1.612e-01  3.404e-02   4.735 2.33e-06 ***
pitching_h          -2.747e-01  7.099e-02  -3.870 0.000112 ***
pitching_bb         -6.647e-02  3.741e-02  -1.777 0.075761 .  
pitching_so          1.534e-01  3.120e-02   4.917 9.44e-07 ***
fielding_e          -5.182e-01  3.776e-02 -13.724  < 2e-16 ***
fielding_dp         -2.421e-01  2.294e-02 -10.554  < 2e-16 ***
free_bases_num      -9.498e-01  1.557e-01  -6.101 1.23e-09 ***
total_bases          3.222e-01  7.715e-02   4.176 3.08e-05 ***
total_bases_allowed  8.224e-02  5.526e-02   1.488 0.136825    
HR_over_OP          -4.188e-02  2.460e-02  -1.702 0.088816 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8009 on 2260 degrees of freedom
Multiple R-squared:  0.3628,    Adjusted R-squared:  0.3586 
F-statistic: 85.79 on 15 and 2260 DF,  p-value: < 2.2e-16
paste('MSE equal ', mse(stepwise_base_model_bw))
[1] "MSE equal  0.636893215447998"
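
The backward search ends on the same formula, coefficients, and MSE as the both-direction search. To compare candidate models side by side instead of scrolling between summaries, the key figures can be collected into one table. This is a minimal sketch with toy `mtcars` models. In this report the list would hold the fitted stepwise models, and the MSE column assumes the mean-of-squared-residuals definition.

```r
# Toy candidates on mtcars; in the report these would be the three stepwise fits.
models <- list(
  small = lm(mpg ~ wt, data = mtcars),
  large = lm(mpg ~ wt + hp + qsec, data = mtcars)
)

comparison <- data.frame(
  model  = names(models),
  adj_r2 = sapply(models, function(m) summary(m)$adj.r.squared),
  aic    = sapply(models, function(m) extractAIC(m)[2]),
  mse    = sapply(models, function(m) mean(residuals(m)^2)),
  row.names = NULL
)
comparison
```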

Conclusion

Applying the transformations definitely made a difference, as the residual plots show. The residuals are now approximately normal (per the Q-Q plot), and the Residuals vs Fitted plot shows no pattern. Taking R-squared and adjusted R-squared together with the residual plots, it is easy to conclude that the stepwise approach combined with the transformations leads to a better model.

Though RMSE and R-squared from the other models seem to suggest otherwise, the stepwise model appears to be more stable. The Cook’s distance plot also shows that there are influential observations, but for some reason I could not get robust regression to work. From my understanding, robust regression would put less emphasis on those data points, leading to a more accurate model.
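
The error from the robust fit is not shown, so its cause is unknown. One possibility, offered only as an assumption, is that `rlm()` from `MASS` stops when the design matrix is rank deficient, and the full model here reports coefficients "not defined because of singularities". The sketch below fits a robust regression on `mtcars` after checking the rank of the design matrix. The model and variable names are illustrative.

```r
library(MASS)

# Toy model on mtcars; predictors must not be linear combinations of each other.
X <- model.matrix(mpg ~ wt + hp + qsec, data = mtcars)
stopifnot(qr(X)$rank == ncol(X))   # rlm() stops on a rank-deficient design

robust_fit <- rlm(mpg ~ wt + hp + qsec, data = mtcars)  # Huber M-estimator by default

# Observations with large residuals receive weights below 1.
head(sort(robust_fit$w))
```

If the rank check fails on the real data, fitting `rlm()` on the formula chosen by the stepwise search, which has no undefined coefficients, would be a natural next attempt.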

b25leWJhbGxfaW1wJFNPX292ZXJfT1AgPSBtb25leWJhbGxfaW1wJHBpdGNoaW5nX3NvIC0gbW9uZXliYWxsX2ltcCRiYXR0aW5nX3NvCiMgbWFrZSBhbGlzdCBvZiBwcmVkaWN0b3JzIGFuZCBmb3JtYXQgdGhlbQoKY29sbmFtZXMobW9uZXliYWxsX2ltcCkKcHJlZF9saXN0IDwtCiAgImluZGV4ICsgdGFyZ2V0X3dpbnMgKyBiYXR0aW5nX2ggKyBiYXR0aW5nXzJiICsgYmF0dGluZ18zYiArIGJhdHRpbmdfaHIgKwpiYXR0aW5nX2JiICsgYmF0dGluZ19zbyArIGJhc2VydW5fc2IgKyBiYXNlcnVuX2NzICsgcGl0Y2hpbmdfaCArIHBpdGNoaW5nX2hyICsKcGl0Y2hpbmdfYmIgKyBwaXRjaGluZ19zbyArIGZpZWxkaW5nX2UgKyBmaWVsZGluZ19kcCArIGJhdHRpbmdfaGJwICsgYmF0dGluZ19oYnBfYmkgKwpiYXR0aW5nXzFCICsgZnJlZV9iYXNlc19udW0gKyB0b3RhbF9iYXNlcyArIHRvdGFsX2Jhc2VzX2FsbG93ZWQgKyBIUl9vdmVyX09QICsgd2Fsa3Nfb3Zlcl9PUCArIFNPX292ZXJfT1AiCiNrZWVwIHRoZSBuZXcgdmFyaWFibGVzIGluIGEgdmVjdG9yIGZvciB0ZXh0aW5nIGxhdGVyLCBpbiBjYWUgdGhleSBkb24ndCBwcm92ZSB0byBiZSBvZiBhbnkgdmFsdWUuCm5ld192YXIgPC0gYygiYmF0dGluZ18xQiIsImZyZWVfYmFzZXNfbnVtIiwidG90YWxfYmFzZXMiLCJ0b3RhbF9iYXNlc19hbGxvd2VkIiwiSFJfb3Zlcl9PUCIsIndhbGtzX292ZXJfT1AiLCJTT19vdmVyX09QIikKYGBgCgpOb3cgdGhhdCB3ZSBoYXZlIGltcHV0ZWQgYW5kIGNyZWF0ZWQgbmV3IHZhcmlhYmxlcywgbGV0J3MgbG9vayBhdCB0aGUgY29ycmVsYXRpb24gbWF0cml4IHRvIHVuZGVyc3RhbmQgdGhlIGNvcnJlbGF0aW9uIGJldHdlZW4gdGhlIHZhcmlhYmxlcyBhbmQgdGhlIHRyYWdldF93aW5zCgpgYGB7cn0KbW9uZXliYWxsX2ltcCA8LSBzdWJzZXQobW9uZXliYWxsX2ltcCwgc2VsZWN0ID0gLWMoYmF0dGluZ19oYnApKQpjb3IobW9uZXliYWxsX2ltcCkKYGBgCgojIyBCdWlsZCBhIE1vZGVsCgpMZXQncyB0ZXN0IGEgbW9kZWwgdG8gZXN0YWJsaXNoIGEgYmFzZWxpbmUKCmBgYHtyIH0Kc3RyKG1vbmV5YmFsbF9pbXApCmJhc2VfbW9kZWxfYWxsIDwtIGxtKHRhcmdldF93aW5zIH4gYmF0dGluZ19oICsgYmF0dGluZ18yYiArIGJhdHRpbmdfM2IgKyBiYXR0aW5nX2hyICsgYmF0dGluZ19iYiArIGJhdHRpbmdfc28gKyBiYXNlcnVuX3NiICsgYmFzZXJ1bl9jcyArIHBpdGNoaW5nX2ggKyBwaXRjaGluZ19ociArIHBpdGNoaW5nX2JiICsgcGl0Y2hpbmdfc28gKyBmaWVsZGluZ19lICsgZmllbGRpbmdfZHAgKyBiYXR0aW5nX2hicCArIGJhdHRpbmdfaGJwX2JpICsgYmF0dGluZ18xQiArIGZyZWVfYmFzZXNfbnVtICsgdG90YWxfYmFzZXMgKyB0b3RhbF9iYXNlc19hbGxvd2VkICsgSFJfb3Zlcl9PUCArIHdhbGtzX292ZXJfT1AgKyBTT19vdmVyX09QLCBkYXRhID0gbW9uZXliYWxsX2ltcCkKcGFyKG1mcm93PWMoMiwyKSkKcGxvdChiYXNlX21vZGVsX2FsbCkK
c3VtbWFyeShiYXNlX21vZGVsX2FsbCkKbXNlIDwtIGZ1bmN0aW9uKHNtKSAKICBtZWFuKHNtJHJlc2lkdWFsc14yKQoKcGFzdGUoJ01TRSBlcXVhbCAnLCBtc2UoYmFzZV9tb2RlbF9hbGwpKQpgYGAKVGhvdWdoIFItc3F1YXJlZCBhbmQgYWRqdXN0ZWQgUi1zcXVhcmUgaXMgaGlnaCwgd2UgY2FuIGNsZWFybHkgc2VlIHRoYXQgdGhpcyBtb2RlbCBkcm9wcGluZyBvYnNlcnZhdGlvbnMuIExldCdzIHRyeSB0byBmb3JnZXQgYWJvdXQgdGhlIG5ldyBhZGRpdGlvbnMsIGFuZCBidWlsZCBhIG1vZGVsIHdpdGhvdXQgdGhlbS4KCmBgYHtyIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0UsIHBhZ2VkLnByaW50PUZBTFNFfQptb25leWJhbGxfb3JpZyA8LSBtb25leWJhbGxfaW1wWywxOjE3XQpiYXNlX21vZGVsX29yaWcgPC0KICBsbSh0YXJnZXRfd2lucyB+IGJhdHRpbmdfaCArIGJhdHRpbmdfMmIgKyBiYXR0aW5nXzNiICsgYmF0dGluZ19ociArIGJhdHRpbmdfYmIgKyBiYXR0aW5nX3NvICsgYmFzZXJ1bl9zYiArIGJhc2VydW5fY3MgKyBwaXRjaGluZ19oICsgcGl0Y2hpbmdfaHIgKyBwaXRjaGluZ19iYiArIHBpdGNoaW5nX3NvICsgZmllbGRpbmdfZSArIGZpZWxkaW5nX2RwLCBkYXRhID0gbW9uZXliYWxsX29yaWcpCiAgcGFyKG1mcm93ID0gYygyLCAyKSkKICBwbG90KGJhc2VfbW9kZWxfb3JpZykKICBzdW1tYXJ5KGJhc2VfbW9kZWxfb3JpZykKICBwYXN0ZSgnTVNFIGVxdWFsICcsIG1zZShiYXNlX21vZGVsX29yaWcpKQpgYGAKVGhpcyBtb2RlbCBsb29rcyBnb29kLCBmcm9tIGEgcGVyZm9ybWFuY2UgcG9pbnQgb2YgdmlldyhyMiksIGJ1dCB3aGVuIEkgbG9vayBhdCB0aGUgdmFyaWFuY2Ugb2YgdGhlIHJlc2lkdWFsIEkgZG9uJ3QgZmVlbCBzZWN1cmUuICAKTGV0J3MgYnVpbGQgYW5vdGhlciBtb2RlbCBpbmNsdWRpbmcgbG9uIHRob3NlIHdpdGggbG93IHAtVmFsdWVzLgoKYGBge3J9CmJhc2VfbW9kZWxfbHAgPC0KICBsbSh0YXJnZXRfd2lucyB+IGJhdHRpbmdfaCArIGJhdHRpbmdfMmIgKyBiYXR0aW5nX2hyICsgYmF0dGluZ19iYiArIGJhdHRpbmdfc28gKyBiYXNlcnVuX3NiICsgcGl0Y2hpbmdfaCArIHBpdGNoaW5nX3NvICsgZmllbGRpbmdfZSArIGZpZWxkaW5nX2RwLCBkYXRhID0gbW9uZXliYWxsX29yaWcpCiAgcGFyKG1mcm93ID0gYygyLCAyKSkKICBwbG90KGJhc2VfbW9kZWxfbHApCiAgc3VtbWFyeShiYXNlX21vZGVsX2xwKQogIHBhc3RlKCdNU0UgZXF1YWwgJywgbXNlKGJhc2VfbW9kZWxfbHApKQpgYGAKCkxldHMgcmVtb3ZlIHZhcmlhYmxlcyBjYXVzaW5nIG11bHRpY29sbGluZWFyaXR5IHVzaW5nIGBmaW5kQ29ycmVsYXRpb24oKWAuCmBgYHtyfQp0b19ybSA8LSBjb2xuYW1lcyhjb3IobW9uZXliYWxsX2ltcClbLGZpbmRDb3JyZWxhdGlvbihjb3IobW9uZXliYWxsX2ltcCkpXSkKdG9fcm0KYGBgCgpgYGB7cn0KYmFzZV9tb2RlbF9ub0NvbCA8LQogIGxtKHRhcmdldF93aW5zIH4gYmF0dGluZ19oICsgYmF0
dGluZ18yYiArIGJhdHRpbmdfYmIgKyBiYXR0aW5nX3NvICsgYmFzZXJ1bl9zYiArIHBpdGNoaW5nX3NvICsgZmllbGRpbmdfZSArIGZpZWxkaW5nX2RwLCBkYXRhID0gbW9uZXliYWxsX29yaWcpCiAgcGFyKG1mcm93ID0gYygyLCAyKSkKICBwbG90KGJhc2VfbW9kZWxfbm9Db2wpCiAgc3VtbWFyeShiYXNlX21vZGVsX25vQ29sKQogIHBhc3RlKCdNU0UgZXF1YWwgJywgbXNlKGJhc2VfbW9kZWxfbm9Db2wpKQpgYGAKVGhvdWdoIHRoZSByc3F1YXJlZCB2YWx1ZSB3ZW50IGRvd24sIHRoZXJlIGFyZSBzb21lIGltcHJvdmVtZW50cyBvbiB0aGUgQ29vaydzIGRpc3RhbmNlIGNoYXJ0LgpOb3cgbGV0J3MgdHJ5IHRvIHVzZSB1c2UgdGhlIGBjYXJldGAgcGFja2FnZSB0byBhcHBseSB0aGUgdHJhbnNmb3JtYXRpb25zIHdlIGRpc2N1c3NlZCBlYXJsaWVyIGluIG91ciBleHBsb3JhdGlvbiBwaGFzZS4gIAoKMS4gQ2VudGVyIGFuZCBTY2FsZSB0aGUgZGF0YQoyLiBGaXggdGhlIHRoZSBwcm9ibGVtIHdpdGggb3V0bGllcnMgYnkgdXNpbmcgc3BhdGlhbCBzaWduIFRyYW5zZm9ybWF0aW9uICAKMy4gTGFzdCBidXQgbm90IGxlYXN0IGEgYm94Y294IHRyYW5zZm9ybWF0aW9uIHRvIHRha2UgY2FyIG9mIHRoZSBza2V3bmVzcyAgIApgYGB7ciBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFLCBwYWdlZC5wcmludD1GQUxTRX0KdHJhbnMgPC0gcHJlUHJvY2Vzcyhtb25leWJhbGxfaW1wLCBtZXRob2QgPSBjKCJCb3hDb3giKSkKdHJhbnNmb3JtZWQgPC0gcHJlZGljdCh0cmFucywgbW9uZXliYWxsX2ltcCkKaGVhZCh0cmFuc2Zvcm1lZCkKCnRyYW5zX21vZGVsX2FsbCA8LQogIGxtKHRhcmdldF93aW5zIH4gYmF0dGluZ19oICsgYmF0dGluZ18yYiArIGJhdHRpbmdfM2IgKyBiYXR0aW5nX2JiICsgYmF0dGluZ19zbyArIGJhc2VydW5fc2IgKyBiYXNlcnVuX2NzICsgcGl0Y2hpbmdfaCArIHBpdGNoaW5nX2hyICsgcGl0Y2hpbmdfYmIgKyBwaXRjaGluZ19zbyArIGZpZWxkaW5nX2UgKyBmaWVsZGluZ19kcCAgKyBiYXR0aW5nXzFCICsgZnJlZV9iYXNlc19udW0gKyB0b3RhbF9iYXNlcyArIHRvdGFsX2Jhc2VzX2FsbG93ZWQgKyBIUl9vdmVyX09QICsgd2Fsa3Nfb3Zlcl9PUCArIFNPX292ZXJfT1AsIGRhdGEgPSB0cmFuc2Zvcm1lZCkKICBwYXIobWZyb3cgPSBjKDIsIDIpKQogIHBsb3QodHJhbnNfbW9kZWxfYWxsKQogIHN1bW1hcnkodHJhbnNfbW9kZWxfYWxsKQogIAogIHBhc3RlKCdNU0UgZXF1YWwgJywgbXNlKHRyYW5zX21vZGVsX2FsbCkpCiAgCmBgYAoKYGBge3J9CnBhcihtZnJvdyA9IGMoMSwgMykpCmkgPSAyCndoaWxlIChpICVpbiUgYygyOjE3KSkgewogCnBsb3QodHJhbnNmb3JtZWRbLGldLCB0cmFuc2Zvcm1lZCRpbmRleCwgeGxhYiA9IGNvbG5hbWVzKHRyYW5zZm9ybWVkKVtpXSAsIHlsYWIgPSAiSW5kZXgiLCBtYWluID0gcGFzdGUoImNsZXZlbGFuZCBkb3RwbG90IG9mICIsY29sbmFtZXModHJhbnNmb3JtZWQpW2ldKSkKCmJv
eHBsb3QodHJhbnNmb3JtZWRbLGldLCBjb2wgPSAiI0E3MTkzMCIsIG1haW4gPSBwYXN0ZSgiQm94cGxvdCBvZiAiLGNvbG5hbWVzKHRyYW5zZm9ybWVkKVtpXSkpCgpoaXN0KAogIHRyYW5zZm9ybWVkWyxpXSwKICBjb2wgPSAiI0E3MTkzMCIsCiAgeGxhYiA9IGNvbG5hbWVzKHRyYW5zZm9ybWVkKVtpXSwKICBtYWluID0gcGFzdGUoIkhpc3RvZ3JhbSBvZiAiLGNvbG5hbWVzKHRyYW5zZm9ybWVkKVtpXSkKKQogIGkgPSBpICsgMQp9CmBgYApMb29raW5nIGF0IENvb2sncyBEaXN0YW5jZSwgaXQncyBjbGVhciB0aGF0IHdlIGhhdmUgaW5mbHVlbnRpYWwgZGF0YSwgYnV0IHRoZSBvdGhlciBjaGFydHMgbG9vayByaWdodCB3aGVyZSB0aGV5IHNob3VsZCBiZS4KCkxldCdzIHRyeSwgc3RlcHdpc2UgYXBwcm9hY2guCjEuIEJvdGggZGlyZWN0aW9uCmBgYHtyIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0UsIHBhZ2VkLnByaW50PUZBTFNFfQpzdGVwd2lzZV9iYXNlX21vZGVsX2JkIDwtIHN0ZXBBSUModHJhbnNfbW9kZWxfYWxsLCBkaXJlY3Rpb24gPSAiYm90aCIpCgpwYXIobWZyb3cgPSBjKDIsIDIpKQogIHBsb3Qoc3RlcHdpc2VfYmFzZV9tb2RlbF9iZCkKICBzdW1tYXJ5KHN0ZXB3aXNlX2Jhc2VfbW9kZWxfYmQpCnBhc3RlKCdNU0UgZXF1YWwgJywgbXNlKHN0ZXB3aXNlX2Jhc2VfbW9kZWxfYmQpKQpgYGAKCjIuIEZvcndhcmQgZGlyZWN0aW9uCgpgYGB7ciBtZXNzYWdlPUZBTFNFLCB3YXJuaW5nPUZBTFNFLCBwYWdlZC5wcmludD1GQUxTRX0Kc3RlcHdpc2VfYmFzZV9tb2RlbF9mdyA8LSBzdGVwQUlDKHRyYW5zX21vZGVsX2FsbCwgZGlyZWN0aW9uID0gImZvcndhcmQiKQoKcGFyKG1mcm93ID0gYygyLCAyKSkKICBwbG90KHN0ZXB3aXNlX2Jhc2VfbW9kZWxfZncpCiAgc3VtbWFyeShzdGVwd2lzZV9iYXNlX21vZGVsX2Z3KQpwYXN0ZSgnTVNFIGVxdWFsICcsIG1zZShzdGVwd2lzZV9iYXNlX21vZGVsX2Z3KSkKYGBgCgoKMy4gQmFja3dhcmRzIGRpcmVjdGlvbgoKYGBge3IgbWVzc2FnZT1GQUxTRSwgd2FybmluZz1GQUxTRSwgcGFnZWQucHJpbnQ9RkFMU0V9CnN0ZXB3aXNlX2Jhc2VfbW9kZWxfYncgPC0gc3RlcEFJQyh0cmFuc19tb2RlbF9hbGwsIGRpcmVjdGlvbiA9ICJiYWNrd2FyZCIpCgpwYXIobWZyb3cgPSBjKDIsIDIpKQogIHBsb3Qoc3RlcHdpc2VfYmFzZV9tb2RlbF9idykKICBzdW1tYXJ5KHN0ZXB3aXNlX2Jhc2VfbW9kZWxfYncpCnBhc3RlKCdNU0UgZXF1YWwgJywgbXNlKHN0ZXB3aXNlX2Jhc2VfbW9kZWxfYncpKQpgYGAKCgojIyBDb25jbHVzaW9uCgpJdCBkZWZpbml0ZWx5IG1hZGUgYSBkaWZmZXJlbmNlIHdoZW4gdGhlIHRyYW5zZm9ybWF0aW9uIHdlcmUgYXBwbGllZC4gT25lIGNhbiBzZWUgdGhlIGRpZmZlcmVuY2UgaW4gdGhlIHJlc2lkdWFsIHBsb3RzLiBUaGUgcmVzaWR1YWwgaXMgbm93IG5vcm1hbChwZXIgUVEgcGxvdCksIGFuZCB0aGVyZSBhcmUgbm8gcGF0dGVybnMgd2hlbiB3ZSBsb29rIGF0IGhl
IFJzaWR1YWxzIFZzIEZpdHRlZCBwbG90LgpXaGVuIGxvb2tpbmcgYXQgdGhlIFJzcXVhcmVkIGFuZCBBZGp1c3RlZCBSc3F1YXJlZCB0b2dldGhlciB3aXRoIHRoZSByZXNpZHVhbCBwbG90cywgaXQncyBlYXN5IHRvIGNvbmNsdWRlIHRoYXQgdGhlIG1vZGVsIHdpdGggdGhlIHN0ZXB3aXNlIGFwcHJvYWNoIHRvZ2V0aGVyIHdpdGggdGhlIHRyYW5zZm9ybWF0aW9ucyBpcyB0aGUgb25lIHRoYXQgbGVhZHMgdG8gYSBiZXR0ZXIgbW9kZWwuCgpUaG91Z2ggUk1TRSBhbmQgUnNxdWFyZWQgZnJvbSB0aGUgb3RoZXIgbW9kZWxzIHNlZW0gdG8gc3VnZ2VzdCBvdGhlcndpc2UsIHRoZSBzdGVwd2lzZSBtb2RlbCBhcHBlYXJzIHRvIGJlIG1vcmUgc3RhYmxlLiBJIGFsc28gbm90aWNlZCBieSBsb29raW5nIGF0IHRoZSBDb29rJ3MgRGlzdGFuY2UgcGxvdCB0aGF0IHRoZXJlIGFyZSBpbmZsdW5jaWFsIG9ic2VydmF0aW9ucywgYnV0IGZvciBzb21lIHJlYXNvbiBJIGNvdWxkIG5vdCBnZXQgcm9idXN0IHJlZ3Jlc3Npb24gdG8gd29yay4gRnJvbSBteSB1bmRlcnN0YW5kaW5nLCByb2J1c3QgcmVncmVzc2lvbiB3b3VsZCBwdXQgbGVzcyBlbnBoYXNpcyBvbiB0aG9zZSBkYXRhIHBvaW50cywgbGVhZGluZyB0byBhIG1vcmUgYWNjdXJhdGUgbW9kZWwuCgoKCgoKCgoKCgoKCgoK